Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-70101] Revive ability to snapshot() the CertificateCredentials so they can be used on remote agents #391

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

jimklimov
Copy link

@jimklimov jimklimov commented Nov 21, 2022

As detailed in https://issues.jenkins.io/browse/JENKINS-70101 I've hit a problem using certificate-based user authentication in http-request-plugin (noticed as part of my work on https://issues.jenkins.io/browse/JENKINS-70000 in jenkinsci/http-request-plugin#120 but also reproduced with latest 1.16 release of that plugin).

After a week of reproductions and tracing across different plugins, the root of the problem crystallized as the use of SecretBytes in UploadedKeyStoreSource and effective lack of special snapshot() support. As a consequence, serialized copies of the key store used the same SecretBytes as on the Jenkins Controller, but being on a different JVM in the remote agent case, they used a different instance of the static KEY field with its own secret field. Although the SecretBytes are diligently copied (took a lot of effort to confirm that as seen in sibling branch https://github.com/jimklimov/credentials-plugin/tree/JENKINS-70101-trace), they are unreadable on the remote agent.

During investigation I found that credentials-plugin did have an implementation for the Certificate Credentials Snapshot Taker, but it was removed by 40d0b5c as part of deprecation of FileOnMasterKeyStoreSource subclass. Also an earlier history of the plugin involved the Secret class which effectively stores a base64 string and only encodes/decodes it on demand -- this was superseded by SecretBytes but some handling for conversion from older versions remained. This code proved to be a good starting point for fixing the problem, although not without some further work:

  • originally it consulted isSnapshotSource (forced true for UploadedKeyStoreSource subclass) and just returned the original credential if so;
  • even with that check disabled, the snapshot used SecretBytes for the copied key store and suffered the same fate - not usable on agent.

This PR adds tests (failing at first to confirm the problem), makes use of older codebases and new verified fixes:

  • UploadedKeyStoreSource.isSnapshotSource() depends on Channel.current() == null (so for remoting-related snapshots it is false and shortcuts to return this are not taken)
  • CertificateCredentialsSnapshotTaker was revived as a separate source file and standalone class, similar to existing UsernamePasswordCredentialsSnapshotTaker
  • UploadedKeyStoreSource was modified to handle again a Secret uploadedKeyStore field (and provide data from it if SecretBytes uploadedKeyStoreBytes are currently null). As part of that, transient and final modifiers on these fields had to go.
  • CertificateCredentialsSnapshotTaker was modified to create the new UploadedKeyStoreSource instance with Secret version rather than SecretBytes (as CredentialsSnapshotTaker docs stipulate, "all the details are captured within the credential")

Thanks to @mawinter69 and @slide for bright ideas and pointers, and general sympathy as I woed on the chat :)

Separately note that this PR adds tests using a separate JVM for the build agent to reproduce the problem. This code may be worth exporting into some JenkinsAgentRule if someone is up for it.

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

NOTE: I'll post commits in several phases, so CI has its chance to fail with the tests and show the original problem in logs.

@jimklimov jimklimov requested a review from a team as a code owner November 21, 2022 13:53
@jimklimov
Copy link
Author

Problem exposed in build logs, e.g. in https://github.com/jenkinsci/credentials-plugin/pull/391/checks?check_run_id=9617279731 :

[2022-11-21T14:02:34.841Z] Querying HTTPS with cert...
[2022-11-21T14:02:34.841Z] [Pipeline] httpRequest
[2022-11-21T14:02:34.841Z] HttpMethod: GET
[2022-11-21T14:02:34.841Z] URL: https://github.xcom/api/v3
[2022-11-21T14:02:34.841Z] Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1
[2022-11-21T14:02:34.841Z] Using authentication: myCert
[2022-11-21T14:02:34.841Z] [Pipeline] }
[2022-11-21T14:02:34.841Z] [Pipeline] // node
[2022-11-21T14:02:34.841Z] [Pipeline] End of Pipeline
[2022-11-21T14:02:34.841Z] javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
[2022-11-21T14:02:34.841Z] 	at java.base/com.sun.crypto.provider.CipherCore.unpad(CipherCore.java:975)
[2022-11-21T14:02:34.841Z] 	at java.base/com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1056)
[2022-11-21T14:02:34.841Z] 	at java.base/com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853)
[2022-11-21T14:02:34.841Z] 	at java.base/com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
[2022-11-21T14:02:34.841Z] 	at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2202)
[2022-11-21T14:02:34.841Z] 	at com.cloudbees.plugins.credentials.SecretBytes.getPlainData(SecretBytes.java:140)
[2022-11-21T14:02:34.841Z] Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to worker
[2022-11-21T14:02:34.841Z] 		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1784)
[2022-11-21T14:02:34.841Z] 		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
[2022-11-21T14:02:34.841Z] 		at hudson.remoting.Channel.call(Channel.java:1000)
[2022-11-21T14:02:34.841Z] 		at jenkins.plugins.http_request.HttpRequestStep$Execution.run(HttpRequestStep.java:404)
[2022-11-21T14:02:34.841Z] 		at jenkins.plugins.http_request.HttpRequestStep$Execution.run(HttpRequestStep.java:383)
[2022-11-21T14:02:34.841Z] 		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
[2022-11-21T14:02:34.841Z] 		at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[2022-11-21T14:02:34.841Z] Caused: java.lang.Error
[2022-11-21T14:02:34.841Z] 	at com.cloudbees.plugins.credentials.SecretBytes.getPlainData(SecretBytes.java:142)
[2022-11-21T14:02:34.841Z] 	at com.cloudbees.plugins.credentials.SecretBytes.getPlainData(SecretBytes.java:233)
Output truncated.

Will post the fix commits now :)

@jimklimov
Copy link
Author

Tests fixed. Now posting the final touch, to make them quieter. This passes - and good to merge (on my side) :)

@slide
Copy link
Member

slide commented Nov 21, 2022

@jenkinsci/core-security-review Should probably review this.

@jimklimov
Copy link
Author

jimklimov commented Nov 21, 2022

Yep. In particular, I wonder if it is okay to leave it like this (possibly storing keystore data "plaintext" in a Secret which still needs a password to read, vs. SecretBytes all the time - only in cases where UploadedKeyStoreSource was made by constructor with Secret in the first place, e.g. via snapshot() in now-existing codebase; other cases gotta have disappeared since 2019).

I have a further effort to force conversion from Secret to SecretBytes on first read, if we are in context where Channel is not null and JenkinsJVM indicates running not on a controller. It complicates things a bit, does not seem to break anything, and I have no idea if it actually takes effect (never saw the logs it is supposed to emit in my local branch - not as MVN log, not as pipeline log, not as agent log...) and also there is some indefinite timeframe where the Secret is used by the class instance before the first read. Probably someone with debug privileges can un-private the field to read it, but that's a pretty contrived attack and probably SecretBytes are not much safer in that situation.

@jimklimov
Copy link
Author

jimklimov commented Nov 28, 2022

@jenkinsci/core-security-review Gentle bump :)

@yaroslavafenkin
Copy link
Contributor

Hey @jimklimov, we've noticed the issue. We'll try to review it if/when time permits.

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a security standpoint, I don't think there is any particular issue with taking snapshots of CertificateCredentialsImpl, so you can consider this approved by the Jenkins security team.

My best guess as to what happened is that the original author of 40d0b5c based their fix on a similar fix in ssh-credentials (jenkinsci/ssh-credentials-plugin@18b3121). In that plugin, the security fix eliminated the only remaining key source that required snapshot support so all related code was (correctly) deleted. In this plugin, the situation appears more complicated.

I only looked at the issue and the fix briefly, but I am a little confused about the behavior before the SECURITY-1322 fix though. Your PR fixes the issue by using Secret for the data instead of SecretBytes. This makes me think that the relevant functionality has been broken ever since #64, which IIUC changed CredentialsSnapshotTakerImpl from using Secret to SecretBytes, and appears to use SecretBytes directly in the credentials it returns from snapshot. Any idea if that is correct?

If so, I would be inclined to effectively revert #64 and switch back to Secret in all cases so that we do not need to (re)introduce CertificateCredentialsSnapshotTaker and all of the supporting code. If not, then I don't understand why things would be broken before this PR (but there is a good chance I am misunderstanding something). Also, why doesn't FileCredentialsImpl, which also uses SecretBytes, need a custom CredentialsSnapshotTaker? Maybe it does, but it's not a common enough issue for anyone to have tried to fix it yet (or maybe I just didn't notice it in a quick search)?

The docs here makes me think that maybe this code in http_request should instead be modified somehow, but I'm not sure what the preferred alternative is. Resolve the keystore and password directly there and just send them as Secrets instead of the entire Credentials object?

As a consequence, serialized copies of the key store used the same SecretBytes as on the Jenkins Controller, but being on a different JVM in the remote agent case, they used a different instance of the static KEY field with its own secret field.

This is a bit of a side note, what makes Secret special? Is there something in remoting that treats it differently? Secret.KEY is comparable to SecretBytes.KEY, so I don't really understand what the difference would be here.

@jimklimov
Copy link
Author

jimklimov commented Dec 7, 2022

Thanks for the diligently annotated response :)

Regarding FileCredentialsImpl - I did not use it so maybe the issue is there, just unnoticed. Probably multi-agent tests like those I proposed here can be used to check if it works or is similarly broken.

  • Just in case for future readers, this is a different class (and plugin) from FileOnMasterKeyStoreSource deprecated by SECURITY-1322 fixes ;)

Regarding "correctness" of the fix: it seems that

  • The Secret is easier to read and clone, since it effectively carries the plaintext data as a Base64 String value which can be encoded/decoded for serialization to disk or some such - and the KEY is only used there. Copying it across the remoting boundary is just about copying the (plaintext) string field of the class instance, I guess.
  • On the other hand, SecretBytes stores an array of bytes with encoded data and salt, and relies on its KEY (static so shared by all instances in one JVM - and with remoting there are two JVMs). When a SecretBytes instance is cloned on the same (controller) JVM, the copy is decodable. When it passes the remoting boundary, the actual byte arrays are diligently copied, but the key in that JVM can not decipher it. And there are no provisions to set the KEY in the remote agents (it is generated for a runtime of JVM, not the master.key or some such) - it could be a different way to solve the issue, with a trade-off depending on how much we trust our remote agents.
  • I agree that the Secret is potentially less secure since the plaintext content is easy to access by debugger and API. In case of certificate key store (bytes serialized into a string and back) it is IMHO less of a problem, since a separate password is needed to read the store (and pedantically, same or other password for each private key... not sure if current code allows different passwords here, just did not dig in that direction).

According to my tracing, the referenced code of http_request runs on the controller, as the class is initialized for the step, and so the original credential and its snapshot are readable there. By the time it gets to the actual httpExecute and specifically credentials.getKeyStore() in https://github.com/jenkinsci/http-request-plugin/blob/master/src/main/java/jenkins/plugins/http_request/auth/CertificateAuthentication.java#L34 - that part of code runs on the agent and fails to decipher the exact replica of original SecretBytes in that remote JVM with its different key, but succeeds to make use of Secret.

As I mentioned above, in another branch (and draft PR #393 for review convenience) I have additional tracing and a commit 537d44d to convert from Secret to SecretBytes upon first read in remote JVM. Pedantically it should be safer and seems to not hurt (test-wise), but I think it is sort of moot given that some percentage of the time that data is "less protected" anyway, and for keystores needs a password to read anyway. Still, may be worth adding for additional peace of mind, especially if some use-cases would be not about short-lived storage of this data (with http-request-plugin, it lives as long as each separate step is being processed: snapshot'ed in preparation of the context, handles request/response, and then discarded; it may be longer-lived in legacy freestyle pipelines though).

I have no strong opinion about reverting the older change (from Secret to SecretBytes), but still would be reluctant to do so:

  • in general case, SecretBytes do seem safer vs. memory-poking
  • current solution is around for a few years and maybe something somewhere (maybe in someone's internal JSLs) relies on code and fields being as they are. I've seen pipelines fiddling with field visibility via reflection here and there, to just get their job done, albeit quickly and dirtily, so while this is not something we want to bother "supporting", got no good reason for breaking that either :)

Finally, regarding the idea to effectively redefine http-request-plugin::CertificateAuthentication so we would extract needed credential fields - it does sound viable. However on one hand it would be more work (compared to me having the solution I need with this PR, and another PR there jenkinsci/http-request-plugin#120 for better support of trusted CAs), and so something to reimplement and retest for no apparent gain. Also not sure if it is more/less safe for the credential payload that we are trying to protect; potentially the getters used now might deny privy access to certain agents (e.g. in a paranoid subclass), and getting info earlier would forfeit such security. On another, the root cause of my stumble there was that snapshotting in this plugin happened to be dysfunctional if remoting got involved. So a different implementation there would be a workaround, and potential other broken use-cases would need to do similar or invent their new wheel. Although the referenced docs are a bit ambiguous: "if you need to send a credential - use snapshot()" (here this one is broken so needs a fix proposed by this PR) "...but better avoid sending it at all" (your proposal).

Maybe a similar idea is viable - to store the name of credential and request it (from agent if needed) as that doc suggests. But in case of long-lived HttpRequestExecution objects (freestyle?) this leaves a time gap for credential to change or disappear. A snapshot protects from that.

So it likely makes sense to log an RFE issue in http-request-plugin to rearchitect this part, but I was not on a crusade to make all of the world better - got dayjob tasks to tend to, so have to be satisfied by getting it working and leave refactoring to others :) But since the root-cause was broken snapshot() ability here, well... it is a bug that should be fixed anyway. Work with that plugin just led me to the discovery :)

@dwnusbaum
Copy link
Member

dwnusbaum commented Dec 7, 2022

The Secret is easier to read and clone, since it effectively carries the plaintext data as a Base64 String value which can be encoded/decoded for serialization to disk or some such - and the KEY is only used there. Copying it across the remoting boundary is just about copying the (plaintext) string field of the class instance, I guess.

Ah, yeah I did not realize that Secret.value is just the plaintext, whereas SecretBytes.value is the ciphertext (this comment seems wrong looking at getPlaintextData and getEncryptedData()). I guess that does mean that #64 is what originally broke the ability to use these credentials remotely. Another fix could be to make SecretBytes hold the unencrypted value in memory, comparable to Secret, which I think would prevent us from needing to add all of the snapshot-related code back, and it would allow other credentials types that use SecretBytes to work on agents without special snapshot support.

@jimklimov
Copy link
Author

jimklimov commented Dec 7, 2022

At least that's an option to keep in mind. If it bites someone else. This use-case issue took 3? years to be noticed intently enough...

I am not certain about reasoning behind SecretBytes but it does seem to play safe about the original data (e.g. generic plaintext contents like passwords) especially if it is not further protected like a keystore is. A memory-dump snapshot of those is not immediately readable, unlike a Secret, which probably is the point of this class. Maybe a decent solution could be to craft a special (fields? subclass?) sort of SecretBytes that is less protected, when explicitly requesting a snapshot() (add the method to API and call it from other plugins?)... so that e.g. a null KEY would be used and a JVM ID would be recorded, and the first use on a different JVM would re-encrypt the data safely again. In that case general use remains fully safe, snapshot is a minor risk trade-off, and nuances would be in the SecretBytes class (and new API calls instead of default clone(), or some unKey() after a clone(), to call from snapshot() implementations of credential classes and other consumers).

@dwnusbaum
Copy link
Member

A memory-dump snapshot of those is not immediately readable, unlike a Secret, which probably is the point of this class.

This comment and the reply make me think that the main goals were code deduplication and to avoid the need to Base64-encode raw byte[] secrets that may not be text when storing them in a Secret. I do not think that it was intended to offer security improvements over Secret, and I do not think that it does.

@jimklimov
Copy link
Author

Ok, I'll trust your judgement on this then - you're the security team and know more context overall :)

Now, what about this-here PR? To me and my tasks it brings "immediate relief" for an issue at hand; you seem to be in favor of deeper design changes... Can this one be merged as proposed and the rest iterated separately? :)

@dwnusbaum
Copy link
Member

dwnusbaum commented Dec 9, 2022

Now, what about this-here PR? To me and my tasks it brings "immediate relief" for an issue at hand; you seem to be in favor of deeper design changes..

From my perspective, this general goal of this PR and the approach seem fine. My personal opinion is that #391 (comment) might be simpler and help us avoid related issues in the long run, but that's just my opinion.

I am not an active maintainer of this plugin, so I'll request a review and see if anyone who is responsible for it wants to review it (sorry, I should have done this after my initial review). I have not reviewed the implementation carefully, but generally speaking the tests look more complicated than I would expect. I don't think we want to take on dependencies on http_request or job-dsl, even in test scope. I don't think it makes sense to test the behavior of withCredentials here, and I don't think we want a credentials-binding dependency either. I think you could probably use a MasterToSlaveCallable (or a basic Step if you must use Pipeline) to drastically simplify the tests.

In case you did not already know, you can always download a build of the plugin with your PR from the "Incremental" GitHub check to use while you wait for a maintainer. The latest build as of the time of writing this comment can be found here: https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/plugins/credentials/1235.v3213f694ca_1c/.

@dwnusbaum dwnusbaum requested a review from a team December 9, 2022 14:54
Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments related to the tests. I only took a quick look, so I might have misunderstood some things.

return agentUsable;
}

Boolean setupAgent() throws IOException, InterruptedException, OutOfMemoryError {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use JenkinsRule.createOnlineSlave and delete this method, or is there a significant difference that I missed?

Copy link
Author

@jimklimov jimklimov Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily, was not aware about it. Probably none of the sources I grepped into (nor IDEA highlighting) hit the keywords I used, which alas remains the best method of random ad-hoc learning so far :\

So even if I did reinvent a wheel, looks like mine is quite round at least :)

SystemCredentialsProvider.getInstance().save();
}

String cpsScriptCredentialTestImports() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have to import this stuff, you should write a @TestExtension Step instead (or just a MasterToSlaveCallable, since I don't think Pipeline is required to test this issue).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint, now that I know the keywords, I'll try to find examples for inspiration :)

Comment on lines +314 to +316
"node {\n" +
cpsScriptCertCredentialTestScriptedPipeline("CONTROLLER NODE") +
"}\n";
Copy link
Member

@dwnusbaum dwnusbaum Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Groovy code in a Pipeline always executes on the controller. Wrapping everything in node does not actually change the behavior related to the issue you are seeing - this test, testCertKeyStoreReadableOnController, and testCertKeyStoreReadableOnNodeRemote should all have the same behavior as far as the credential is concerned regardless of your changes (unless I missed something in the Pipeline script that actually makes use of the agent). To actually make use of the FilePath or Launcher provided by the node you would need to write a Step.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my case it was more to rule out that nothing pops up with plain code and an unnamed node. It certainly did differ (original problem) for "named node" (or agent with label in declarative), with part of the plugin handling running on the remote agent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in these three test cases, the nodes are not actually used by the credential-related code in the Pipelines. Regardless, I would replace your new tests with the minimal test from 4ee16f3.

@dwnusbaum dwnusbaum requested a review from a team December 9, 2022 15:04
@dwnusbaum dwnusbaum changed the title Revive ability to snapshot() the CertificateCredentials so they can be used on remote agents [JENKINS-70101] Revive ability to snapshot() the CertificateCredentials so they can be used on remote agents Dec 9, 2022
@dwnusbaum
Copy link
Member

dwnusbaum commented Dec 9, 2022

Anyway, since I am trying to be helpful and not just nitpicky (😄), take a look at master...dwnusbaum:credentials-plugin:JENKINS-70101 to see what a fix that changes SecretBytes to use unencrypted data in memory looks like. Whether maintainers would prefer that type of fix or yours, I do not know.

That said, my test in that branch is minimal and reproduces the javax.crypto.BadPaddingException: Given final block not properly padded... issue described in JENKINS-70101 without the fix, and passes with the fix. I would probably revert your pom.xml changes, delete CredentialsInPipelineTest.java, and apply 4ee16f3 to your branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants